Fried Chicken Pricing

1. The beginning

For the record, I became a pescetarian (vegetarian + seafood) for some years, but I get tempted sometimes to sin. At a yelp-highly-rated chicken place this summer, I was waiting for my order, which took forever (exaggerated). Without any other better things to do, I re-studied the menu there, trying to understand the pricing structure. See photo:
image

So, I was wondering to myself tons of questions: how can I get the best deal? What is the cost per wing, and cost per drum? Does the restaurant use a formula to determine the pricing? What if I add a sin-ful parameter as a penalty term, how much should I order next time to minimize my sin (Root-Mean-Squared-Sinfulness)?

Dang! I am cursed. Shouldn’t have come here, I thought…

You can see the set prices clearly. We will need to re-format our data, and we can practice our Markdown skill at the same time to tabulate them here:

Wings Drums Price
5 0 6.69
10 0 12.99
15 0 17.99
20 0 23.99
40 0 45.99
0 3 6.69
0 5 10.99
0 10 19.99
0 15 29.99
3 2 7.99
7 4 16.99
12 6 27.99
20 9 43.99

Let us now enter these into a dataframe in R. I will echo out the codes, but in general, it’s better not to unless you have something to show the audience in your codes.

# fried-chicken-price
fcp <- data.frame(wing=c(NA,10,15,20,40,0,0,0,0,3,7,12,20), drum=c(0,0,0,0,0,3,5,10,15,2,4,6,9), price=c(6.69,12.99,17.99,23.99,45.99,6.69,10.99,19.99,29.99,7.99,16.99,27.99,43.99) )

So there are 13 data points in our dataset. (Notice the use of inline R codes here.)

2. Exploratory Data Analysis (EDA)

There are a bit of things we typically look at for EDA.

  1. Basic statistics
    • mean, s.d., median, range (four spaces at the start of line for proper sub-list indentation)
  2. Simple correlations and tests
    • correlation matrix if applicable
    • z-test, t-test, anova test if applicable
    • chi-squared test if applicable
  3. Normality
    • QQ-plot
    • boxplot
    • histogram
    • Shapiro-Wilk test

2.1 Basic statistics

str(fcp)
## 'data.frame':    13 obs. of  3 variables:
##  $ wing : num  NA 10 15 20 40 0 0 0 0 3 ...
##  $ drum : num  0 0 0 0 0 3 5 10 15 2 ...
##  $ price: num  6.69 12.99 17.99 23.99 45.99 ...
summary(fcp)
##       wing            drum            price      
##  Min.   : 0.00   Min.   : 0.000   Min.   : 6.69  
##  1st Qu.: 0.00   1st Qu.: 0.000   1st Qu.:10.99  
##  Median : 8.50   Median : 3.000   Median :17.99  
##  Mean   :10.58   Mean   : 4.154   Mean   :20.94  
##  3rd Qu.:16.25   3rd Qu.: 6.000   3rd Qu.:27.99  
##  Max.   :40.00   Max.   :15.000   Max.   :45.99  
##  NA's   :1

Upon inspection, we can see immediately there is a “NA” entry for wing. That’s a mistake. Have an action plan to fix it. After we fix it, re-run the EDA.

fcp[1,1]=5  # equal sign "=" and assignment operator "<-" are interchangeable in R
str(fcp)
## 'data.frame':    13 obs. of  3 variables:
##  $ wing : num  5 10 15 20 40 0 0 0 0 3 ...
##  $ drum : num  0 0 0 0 0 3 5 10 15 2 ...
##  $ price: num  6.69 12.99 17.99 23.99 45.99 ...
summary(fcp)
##       wing            drum            price      
##  Min.   : 0.00   Min.   : 0.000   Min.   : 6.69  
##  1st Qu.: 0.00   1st Qu.: 0.000   1st Qu.:10.99  
##  Median : 7.00   Median : 3.000   Median :17.99  
##  Mean   :10.15   Mean   : 4.154   Mean   :20.94  
##  3rd Qu.:15.00   3rd Qu.: 6.000   3rd Qu.:27.99  
##  Max.   :40.00   Max.   :15.000   Max.   :45.99

2.1 Tests (Correlation, ANOVA, …)

Since the features/variables are all numerical (quantitative), it makes most sense to check their linear correlations.

loadPkg("corrplot")
corrmatrix = cor(fcp)  # more detailed pair-wise correlation test can be obtained from cor.test(fcp$wing,fcp$price) etc
corrplot.mixed(corrmatrix)

It is good to see that price has a decent correlation with both wing and drum, with wing seems to have a higher correlation. It is also great that drum and wing do not have too strong a correlation between them. Imagine if each drum I buy, I always will be thrown with 2 dog bones, I will have a hard time knowing if I am really paying for the drum or the dog bones. This is called a problem of collinearity. (We’ll check with VIF.)

Normality check/test

It is usually best if all the variables (numerical ones) are normally distributed. This usually does not happen, but we would still like to know overall the distribution is bell-shaped, and not too awkward.

2.3.1 QQ-plot

We can first check the QQ-plots for each variable by itself. A straight line means normal distribution.

qqnorm(fcp$price, main = "price Q-Q Plot") 
qqline(fcp$price)

qqnorm(fcp$wing, main = "wing Q-Q Plot") 
qqline(fcp$wing)

qqnorm(fcp$drum, main = "drum Q-Q Plot") 
qqline(fcp$drum)

Not too well of straight line fits, but with only 13 data points, it’s probably okay.

2.3.2 Boxplot

Next we can check the boxplots for a rough visual.

boxplot(fcp, col=c("red","blue","green"), ylab="count or price($)")
axis(side = 4, ylab="price($)")

Same conclusion: they do not look like normal, but we’ll take it.

Histogram

Now histograms:

hist(fcp$price)

hist(fcp$wing)

hist(fcp$drum)

Shapiro-Wilk test

And finally, using shapiro-wilk test:

shapiro.test(fcp$price)
## 
##  Shapiro-Wilk normality test
## 
## data:  fcp$price
## W = 0.89892, p-value = 0.1293
shapiro.test(fcp$wing)
## 
##  Shapiro-Wilk normality test
## 
## data:  fcp$wing
## W = 0.83692, p-value = 0.01936
shapiro.test(fcp$drum)
## 
##  Shapiro-Wilk normality test
## 
## data:  fcp$drum
## W = 0.85207, p-value = 0.03033

We’ll learn to interpret these soon.

Linear Model

Now if those passed the smell test, we can try build some models. Linear model is first to come to mind. We will model the price with wing and drum as the two independent variables (also called features).

chicklm = lm(price ~ ., data=fcp)
summary(chicklm)
## 
## Call:
## lm(formula = price ~ ., data = fcp)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -1.30661 -0.30051 -0.04708  0.15236  1.84476 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  0.77755    0.50378   1.543    0.154    
## wing         1.16298    0.02546  45.670 6.10e-13 ***
## drum         2.01202    0.06202  32.441 1.83e-11 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.9386 on 10 degrees of freedom
## Multiple R-squared:  0.9957, Adjusted R-squared:  0.9949 
## F-statistic:  1165 on 2 and 10 DF,  p-value: 1.426e-12

As we can see, the R\(^2\) (using \(\LaTeX\) formatting here) value of the linear model is 0.9957, which shows the prices are set with rather strict per-piece-structure.

If we use the (default 95%) confidence intervals of the coefficients as shown here:

coeffconfint = confint.lm(chicklm) 
# or just 
# confint(chicklm)
coeffconfint
##                  2.5 %   97.5 %
## (Intercept) -0.3449566 1.900048
## wing         1.1062375 1.219715
## drum         1.8738264 2.150211

Findings

We find that:

  1. Each piece of wing cost about $1.11 to $1.22
  2. Each piece of drum stick cost about $1.87 to $2.15
  3. The intercept of about $-0.345 to $1.9, or average of $0.78 probably represent the base per-order or box/bag charge.

Check:
We should always check for multi-collinearity in linear models.

loadPkg("faraway") # faraway library is one of them has a vif function
vif(chicklm)
##     wing     drum 
## 1.186424 1.186424

With vif values less than 5, we can safely conclude there is not much collinearity concerns in the dataset.

Conclusion

So what is the best deal???

Let me compare the predicted values from the model with the actual values, or look at the residuals.

fcp$fitval = chicklm$fitted.values
fcp$residuals = chicklm$residuals
fcp
##    wing drum price    fitval   residuals
## 1     5    0  6.69  6.592428  0.09757171
## 2    10    0 12.99 12.407311  0.58268922
## 3    15    0 17.99 18.222193 -0.23219327
## 4    20    0 23.99 24.037076 -0.04707576
## 5    40    0 45.99 47.296606 -1.30660571
## 6     0    3  6.69  6.813602 -0.12360174
## 7     0    5 10.99 10.837639  0.15236097
## 8     0   10 19.99 20.897732 -0.90773226
## 9     0   15 29.99 30.957825 -0.96782549
## 10    3    2  7.99  8.290513 -0.30051259
## 11    7    4 16.99 16.966456  0.02354413
## 12   12    6 27.99 26.805376  1.18462435
## 13   20    9 43.99 42.145244  1.84475643

Ah, the residual is most negative with the 40-wing combo at -$1.31. I should go for that and kill myself.

But if I am wiser, and insist on having a one-person portion:

  1. 5-piece wing is at +$0.10 (paying a dime too much!) - Not good.
  2. 3-piece drum is at -$0.12, saving a dozen pennies, sounds good.
  3. The 3-wing-2-drum deal (at -$0.30) is the best for me!

Case closed. Mic dropped.

Oh wait… need to include the sin penalty…

And the wings there were really really good.